Technical report on the Automatic Detection of Paedophile Queries

Authors

  • Matthieu Latapy
  • Raphaël Fournier
Abstract

Filtering or identifying paedophile queries is a key issue for law enforcement and search engines. However, these queries are generally mixed with a huge number of other queries, and little is known about their characteristics. We address both issues here in order to design the first tool for automatic detection of paedophile queries. Using domain expertise, we select some paedophile queries from a set of hundreds of millions of queries entered in a general-public P2P system. We extend this set by manually inspecting the queries it contains and the words composing them. We then design a tool that tags any query as paedophile or not. We run it on our dataset and evaluate its performance by submitting appropriate samples to external experts. This assessment shows that the tool performs very well. Going further, the assessment makes it possible to estimate its error rates precisely and rigorously; we compute and report them.
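
The abstract does not say how the error rates are derived from the expert-labelled samples. As a rough illustration only, the sketch below assumes a simple setup: experts review a random sample of tagged queries, the number of disagreements is counted, and a Wilson score interval is attached to the observed error proportion. The function names and the counts in the usage lines are hypothetical and are not taken from the report.

```python
import math


def error_rate(errors, n):
    """Observed error proportion in an expert-reviewed sample (hypothetical helper)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return errors / n


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for an error proportion.

    z=1.96 corresponds to roughly 95% confidence. This is a standard
    statistical technique, assumed here; the report may use another method.
    """
    p = error_rate(errors, n)
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Illustrative counts only (NOT results from the report):
# experts reject 5 of 100 sampled queries that the tool tagged as positive.
sample_size = 100
rejected = 5
low, high = wilson_interval(rejected, sample_size)
print(f"observed rate: {error_rate(rejected, sample_size):.3f}")
print(f"approx. 95% interval: [{low:.3f}, {high:.3f}]")
```

The same two functions could be applied to a sample of queries tagged as negative to bound the miss rate, under the same sampling assumption.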

Related resources

Technical report on Maps of paedophile activity

As policy-making and law enforcement institutions generally operate at the national level, or at least at a regional level (Europe for instance), we studied geolocated recordings available in a large dataset obtained by a measurement of keyword-based queries submitted to a large P2P server. We observed that the fractions of paedophile queries may be orders of magnitude larger in some countries ...

Quantification of Paedophile Activity in a Large P2P System (Measurement and Analysis of P2P Activity Against Paedophile Content project, http://antipaedo.lip6.fr)

In this work, we explore two basic but crucial statistics: the fraction of paedophile queries entered by users of a large P2P file exchange system, and the fraction of involved users. In order to do so, we carefully inspect two huge datasets of more than one hundred million queries recorded in two very different contexts. We then use a state-of-the-art tool for automatic detection of paedophile...

Dynamics of Paedophile Keywords in eDonkey Queries

This technical report synthesizes the results of the analysis of paedophile keywords’ dynamics in two sets of eDonkey queries, collected during several months in 2007 and 2009 respectively. The goal of this work is to study the evolution of paedophile keywords’ frequency and popularity over several weeks (i.e. within a given dataset), as well as between the two different datasets. Moreover, spe...

Measurement and Analysis of P2P Activity Against Paedophile Content

Peer-to-peer (P2P) systems are nowadays widely used to exchange files, and it is acknowledged that they host much paedophile activity. However, current knowledge of this specific activity remains very limited, and almost no tools exist for user protection. Likewise, tools and knowledge for policy making and law enforcement are far from sufficient. The goal of the Measurement and Analysis of P2P ...

First Report on Paedophile Keywords Observed in eDonkey

This report presents our first analysis results on paedophile keywords observed in exchanges between eDonkey clients and their server. We first describe our dataset and the messages studied in this context. General statistics on the number of queries, filenames, clients and keywords are provided, before focusing on paedophile keywords appearing in user queries and/or in filenames. Statistical a...

Publication date: 2010